Creating Country, State and County maps

From the example from Prof. Chris Sunderland

library(tidyverse)
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'forcats' was built under R version 3.6.3
library(maps)
## Warning: package 'maps' was built under R version 3.6.3
library(mapdata)
## Warning: package 'mapdata' was built under R version 3.6.3
library(lubridate)
library(viridis)
## Warning: package 'viridis' was built under R version 3.6.3
library(wesanderson)
## Warning: package 'wesanderson' was built under R version 3.6.3

This is a map of confirmed Covid-19 cases in the world organized by county and not country. The United States is almost filled with purple because Covid-19 has been confirmed in most counties.

daily_report <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-31-2020.csv")) %>% 
  rename(Long = "Long_") 
    
ggplot(daily_report, aes(x = Long, y = Lat, size = Confirmed/1000)) +
    borders("world", colour = NA, fill = "grey90") +
    theme_bw() +
    geom_point(shape = 21, color='purple', fill='purple', alpha = 0.5) +
    labs(title = 'World COVID-19 Confirmed cases',x = '', y = '',
        size="Cases (x1000))") +
    theme(legend.position = "right") +
    coord_fixed(ratio=1.5)
## Warning: Removed 2 rows containing missing values (geom_point).

Looking at confirmed Covid-19 cases in the United States

This is a map of confirmed Covid-19 cases in the United States, not including the United States territories. Some areas of the map are darker because of overlap among the circles.

daily_report <-   read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-27-2020.csv")) %>% 
  rename(Long = "Long_") %>% 
  filter(Country_Region == "US") %>% 
  filter (!Province_State %in% c("Alaska","Hawaii", "American Samoa",
                  "Puerto Rico","Northern Mariana Islands", 
                  "Virgin Islands", "Recovered", "Guam", "Grand Princess",
                  "District of Columbia", "Diamond Princess")) %>% 
  filter(Lat > 0)
ggplot(daily_report, aes(x = Long, y = Lat, size = Confirmed/1000)) +
    borders("state", colour = "black", fill = "grey90") +
    theme_bw() +
    geom_point(shape = 21, color='purple', fill='purple', alpha = 0.5) +
    labs(title = 'COVID-19 Confirmed Cases in the US', x = '', y = '',
        size="Cases (x1000))") +
    theme(legend.position = "right") +
    coord_fixed(ratio=1.5)

Another way to graph confirmed Covid-19 cases in the United States based on an example by Anisa Dhana using the viridis palatte which is designed to be perceived by viewers with common forms of colour blindness.

This is a map of confirmed Covid-19 cases in the United States, not including the United States territories.This map is the same as above, but is color coded based on a range of confirmed Covid-19 cases.

mybreaks <- c(1, 100, 1000, 10000, 10000)
ggplot(daily_report, aes(x = Long, y = Lat, size = Confirmed)) +
    borders("state", colour = "white", fill = "grey90") +
    geom_point(aes(x=Long, y=Lat, size=Confirmed, color=Confirmed),stroke=F, alpha=0.7) +
    scale_size_continuous(name="Cases", trans="log", range=c(1,7), 
                        breaks=mybreaks, labels = c("1-99",
                        "100-999", "1,000-9,999", "10,000-99,999", "50,000+")) +
    scale_color_viridis_c(option="viridis",name="Cases",
                        trans="log", breaks=mybreaks, labels = c("1-99",
                        "100-999", "1,000-9,999", "10,000-99,999", "50,000+"))  +
# Cleaning up the graph
  
  theme_void() + 
    guides( colour = guide_legend()) +
    labs(title = "COVID-19 Confirmed Cases in the US'") +
    theme(
      legend.position = "bottom",
      text = element_text(color = "#22211d"),
      plot.background = element_rect(fill = "#ffffff", color = NA), 
      panel.background = element_rect(fill = "#ffffff", color = NA), 
      legend.background = element_rect(fill = "#ffffff", color = NA)
    ) +
    coord_fixed(ratio=1.5)
## Warning: Transformation introduced infinite values in discrete y-axis

## Warning: Transformation introduced infinite values in discrete y-axis
## Warning in sqrt(x): NaNs produced
## Warning: Removed 1399 rows containing missing values (geom_point).

Mapping data to shapes

Rather than mapping cases by prevalence in counties throughout the U.S., we are now categorizing confirmed Covid-19 cases by state. Each state will be colored as a whole according to the number of confirmed Covid-19 cases in that state regardless of whether there are counties that do not have confirmd cases of Covid-19.

daily_report <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-27-2020.csv")) %>% 
  rename(Long = "Long_") %>% 
  filter(Country_Region == "US") %>% 
  group_by(Province_State) %>% 
  summarize(Confirmed = sum(Confirmed)) %>% 
  mutate(Province_State = tolower(Province_State))
# load the US map data
us <- map_data("state")
# We need to join the us map data with our daily report to make one data frame/tibble
state_join <- left_join(us, daily_report, by = c("region" = "Province_State"))
# plot state map
ggplot(data = us, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
# Add data layer
  geom_polygon(data = state_join, aes(fill = Confirmed), color = "white") +
  geom_polygon(color = "black", fill = NA) +
  scale_fill_gradient(trans = "log10") +
  labs(title = "COVID-19 Confirmed Cases in the US'")

Using R color palattes

Here is an example using a different color package - Wes Anderson. there are more color palettes here: Top R Color Palettes to Know for Great Data Visualization

In this graph, we are categorizing confirmed Covid-19 cases by state as seen above, but with a different color scheme so it is easier to follow. Each state will be colored as a whole according to the number of confirmed Covid-19 cases in that state regardless of whether there are counties that do not have confirmd cases of Covid-19.

# plot state map
ggplot(data = us, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
# Add data layer
  geom_polygon(data = state_join, aes(fill = Confirmed), color = "white") +
  geom_polygon(color = "black", fill = NA) +
  scale_fill_gradientn(colours = 
                         wes_palette("Zissou1", 100, type = "continuous"),
                         trans = "log10") +
  labs(title = "COVID-19 Confirmed Cases in the US'")

Mapping confirmed Covid-19 cases by county in the United States.

Rather than using circles to show cases in a county, the county is being filled in by a color corresponding to the number of confirmed Covid-19 cases in that county.

library(RColorBrewer)

# To display only colorblind-friendly brewer palettes, specify the option colorblindFriendly = TRUE as follow:

# display.brewer.all(colorblindFriendly = TRUE)

# Get and format the covid report data
report_03_27_2020 <-   read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/04-02-2020.csv")) %>% 
  rename(Long = "Long_") %>% 
  unite(Key, Admin2, Province_State, sep = ".") %>% 
  group_by(Key) %>% 
  summarize(Confirmed = sum(Confirmed)) %>% 
  mutate(Key = tolower(Key))

# dim(report_03_27_2020)

# get and format the map data
us <- map_data("state")
counties <- map_data("county") %>% 
  unite(Key, subregion, region, sep = ".", remove = FALSE)

# Join the 2 tibbles
state_join <- left_join(counties, report_03_27_2020, by = c("Key"))

# sum(is.na(state_join$Confirmed))

ggplot(data = us, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
  # Add data layer
  borders("state", colour = "black") +
  geom_polygon(data = state_join, aes(fill = Confirmed)) +
  scale_fill_gradientn(colors = brewer.pal(n = 5, name = "PuRd"),
                       breaks = c(1, 10, 100, 1000, 10000, 100000),
                       trans = "log10", na.value = "White") +
  ggtitle("Number of Confirmed Cases by US County") +
  theme_bw() 
## Warning: Transformation introduced infinite values in discrete y-axis

Confirmed corona virus cases in Massachusetts, USA

This first graph of Massachusetts is clumped by county. Each county of Massachusetts is colored different corresponding to the number of confirmed Covid-19 cases in that county of Massachusetts. In this example, Middlesex County and Suffolk County have the most cases.

daily_report <-   read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/04-02-2020.csv")) %>% 
  rename(Long = "Long_") %>% 
  filter(Province_State == "Massachusetts") %>% 
  group_by(Admin2) %>% 
  summarize(Confirmed = sum(Confirmed)) %>% 
  mutate(Admin2 = tolower(Admin2))

us <- map_data("state")
ma_us <- subset(us, region == "massachusetts")
counties <- map_data("county")
ma_county <- subset(counties, region == "massachusetts")

state_join <- left_join(ma_county, daily_report, by = c("subregion" = "Admin2")) 

# plot state map
ggplot(data = ma_county, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
# Add data layer
  geom_polygon(data = state_join, aes(fill = Confirmed), color = "white") +
    scale_fill_gradientn(colors = brewer.pal(n = 5, name = "BuGn"),
                         trans = "log10") +
  labs(title = "COVID-19 Confirmed Cases in Massachusetts'")

This graph is mapping the counties the same as above but with a different color scheme.

daily_report <-   read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-27-2020.csv")) %>% 
  rename(Long = "Long_") %>% 
  filter(Province_State == "Massachusetts") %>% 
  group_by(Admin2) %>% 
  summarize(Confirmed = sum(Confirmed)) %>% 
  mutate(Admin2 = tolower(Admin2))
us <- map_data("state")
ma_us <- subset(us, region == "massachusetts")
counties <- map_data("county")
ma_county <- subset(counties, region == "massachusetts")
state_join <- left_join(ma_county, daily_report, by = c("subregion" = "Admin2")) 
# plot state map
ggplot(data = ma_county, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
# Add data layer
  geom_polygon(data = state_join, aes(fill = Confirmed), color = "white") +
    scale_fill_gradientn(colours = 
                         wes_palette("Zissou1", 100, type = "continuous"),
                         trans = "log10") +
  labs(title = "COVID-19 Confirmed Cases in Massachusetts")
## Warning: Transformation introduced infinite values in discrete y-axis

This graph is mapping the same as above but with a different scale and borders surrounding each county and with this map, if you hold your cursor over the county, you will be told how many confirmed cases there are in that county. In this map, it shows that Middlesex County has more confirmed cases than Suffolk County.

daily_report
## # A tibble: 16 x 2
##    Admin2              Confirmed
##    <chr>                   <dbl>
##  1 barnstable                100
##  2 berkshire                 105
##  3 bristol                   129
##  4 dukes                       0
##  5 dukes and nantucket         4
##  6 essex                     350
##  7 franklin                   24
##  8 hampden                    90
##  9 hampshire                  20
## 10 middlesex                 685
## 11 nantucket                   0
## 12 norfolk                   393
## 13 plymouth                  187
## 14 suffolk                   631
## 15 unassigned                303
## 16 worcester                 219

An interactive graph of confirmed Covid-19 cases in Massachusetts, USA

library(plotly)
ggplotly(
  ggplot(data = ma_county, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
# Add data layer
  geom_polygon(data = state_join, aes(fill = Confirmed), color = "black") +
    scale_fill_gradientn(colours = 
                         wes_palette("Zissou1", 100, type = "continuous")) +
  ggtitle("COVID-19 Cases in MA") +
# Cleaning up the graph
  labs(x=NULL, y=NULL) +
  theme(panel.border = element_blank()) +
  theme(panel.background = element_blank()) +
  theme(axis.ticks = element_blank()) +
  theme(axis.text = element_blank())
)

Preparing the times series data

The time series data is ripe for animation but first we need to get and format the files.

time_series_confirmed_long <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) %>%
  rename(Province_State = "Province/State", Country_Region = "Country/Region")  %>% 
               pivot_longer(-c(Province_State, Country_Region, Lat, Long),
                            names_to = "Date", values_to = "Confirmed") 
# Let's get the times series data for deaths
time_series_deaths_long <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv")) %>%
  rename(Province_State = "Province/State", Country_Region = "Country/Region")  %>% 
  pivot_longer(-c(Province_State, Country_Region, Lat, Long),
               names_to = "Date", values_to = "Deaths")
time_series_recovered_long <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv")) %>%
  rename(Province_State = "Province/State", Country_Region = "Country/Region") %>% 
  pivot_longer(-c(Province_State, Country_Region, Lat, Long),
               names_to = "Date", values_to = "Recovered")
# Create Keys 
time_series_confirmed_long <- time_series_confirmed_long %>% 
  unite(Key, Province_State, Country_Region, Date, sep = ".", remove = FALSE)
time_series_deaths_long <- time_series_deaths_long %>% 
  unite(Key, Province_State, Country_Region, Date, sep = ".") %>% 
  select(Key, Deaths)
time_series_recovered_long <- time_series_recovered_long %>% 
  unite(Key, Province_State, Country_Region, Date, sep = ".") %>% 
  select(Key, Recovered)
# Join tables
time_series_long_joined <- full_join(time_series_confirmed_long,
              time_series_deaths_long, by = c("Key"))
time_series_long_joined <- full_join(time_series_long_joined,
              time_series_recovered_long, by = c("Key")) %>% 
    select(-Key)
# Reformat the data
time_series_long_joined$Date <- mdy(time_series_long_joined$Date)
# Create Report table with counts
time_series_long_joined_counts <- time_series_long_joined %>% 
  pivot_longer(-c(Province_State, Country_Region, Lat, Long, Date),
               names_to = "Report_Type", values_to = "Counts")

Creating the animations

Below are the packages I installed. There may be others that you need, in particular to rendering gifs. Some of the example may take several minutes to create the animation.

library(ggplot2)
library(gganimate)
## Warning: package 'gganimate' was built under R version 3.6.3
library(gifski)
## Warning: package 'gifski' was built under R version 3.6.3
library(av)
## Warning: package 'av' was built under R version 3.6.3
library(transformr)
## Warning: package 'transformr' was built under R version 3.6.3
theme_set(theme_bw())

Animation of confirmed Covid-19 cases in China, South Korea, Japan, and the USA over a period of time

data_time <- time_series_long_joined %>% 
    group_by(Country_Region,Date) %>% 
    summarise_at(c("Confirmed", "Deaths", "Recovered"), sum) %>% 
    filter (Country_Region %in% c("China","Korea, South","Japan","US")) 
p <- ggplot(data_time, aes(x = Date,  y = Confirmed, color = Country_Region)) + 
      geom_point() +
      geom_line() +
      ggtitle("Confirmed COVID-19 Cases") +
      geom_point(aes(group = seq_along(Date))) +
      transition_reveal(Date) 
    
animate(p,renderer = gifski_renderer(), end_pause = 15)

Animation for Prof. Chris Sunderland’s example

This is an animation showing the number of confirmed Covid-19 cases across the world as each day of the pandemic passes.

covid <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) %>%
           rename(Province_State= "Province/State", Country_Region = "Country/Region") %>%
           pivot_longer(-c(Province_State, Country_Region, Lat, Long),
                  names_to = "Date", values_to = "Confirmed") %>%
           mutate(Date = mdy(Date) - days(1),
                  Place = paste(Lat,Long,sep="_")) %>%
# Summarizes state and province information
             group_by(Place,Date) %>%
           summarise(cumulative_cases = ifelse(sum(Confirmed)>0,
                     sum(Confirmed),NA_real_),
                     Lat = mean(Lat),
                     Long = mean(Long)) %>%
           mutate(Pandemic_day = as.numeric(Date - min(Date)))
world <- ggplot(covid,aes(x = Long, y = Lat, size = cumulative_cases/1000)) +
                 borders("world", colour = "gray50", fill = "grey90") +
                 theme_bw() +
                 geom_point(color='purple', alpha = .5) +
                 labs(title = 'Pandemic Day: {frame}',x = '', y = '',
                      size="Cases (x1000))") +
                 theme(legend.position = "right") +
                 coord_fixed(ratio=1.3)+
                 transition_time(Date) +
                 enter_fade()
animate(world,renderer = gifski_renderer(), end_pause = 30)
## Warning: Removed 229 rows containing missing values (geom_point).

## Warning: Removed 229 rows containing missing values (geom_point).
## Warning: Removed 221 rows containing missing values (geom_point).
## Warning: Removed 219 rows containing missing values (geom_point).
## Warning: Removed 216 rows containing missing values (geom_point).
## Warning: Removed 213 rows containing missing values (geom_point).
## Warning: Removed 419 rows containing missing values (geom_point).
## Warning: Removed 206 rows containing missing values (geom_point).
## Warning: Removed 203 rows containing missing values (geom_point).
## Warning: Removed 199 rows containing missing values (geom_point).
## Warning: Removed 197 rows containing missing values (geom_point).

## Warning: Removed 197 rows containing missing values (geom_point).

## Warning: Removed 197 rows containing missing values (geom_point).
## Warning: Removed 196 rows containing missing values (geom_point).

## Warning: Removed 196 rows containing missing values (geom_point).

## Warning: Removed 196 rows containing missing values (geom_point).
## Warning: Removed 390 rows containing missing values (geom_point).
## Warning: Removed 195 rows containing missing values (geom_point).

## Warning: Removed 195 rows containing missing values (geom_point).

## Warning: Removed 195 rows containing missing values (geom_point).

## Warning: Removed 195 rows containing missing values (geom_point).

## Warning: Removed 195 rows containing missing values (geom_point).
## Warning: Removed 194 rows containing missing values (geom_point).

## Warning: Removed 194 rows containing missing values (geom_point).

## Warning: Removed 194 rows containing missing values (geom_point).

## Warning: Removed 194 rows containing missing values (geom_point).
## Warning: Removed 387 rows containing missing values (geom_point).
## Warning: Removed 193 rows containing missing values (geom_point).
## Warning: Removed 191 rows containing missing values (geom_point).

## Warning: Removed 191 rows containing missing values (geom_point).

## Warning: Removed 191 rows containing missing values (geom_point).
## Warning: Removed 186 rows containing missing values (geom_point).
## Warning: Removed 182 rows containing missing values (geom_point).
## Warning: Removed 175 rows containing missing values (geom_point).
## Warning: Removed 171 rows containing missing values (geom_point).
## Warning: Removed 164 rows containing missing values (geom_point).
## Warning: Removed 313 rows containing missing values (geom_point).
## Warning: Removed 146 rows containing missing values (geom_point).
## Warning: Removed 142 rows containing missing values (geom_point).
## Warning: Removed 134 rows containing missing values (geom_point).
## Warning: Removed 130 rows containing missing values (geom_point).
## Warning: Removed 121 rows containing missing values (geom_point).
## Warning: Removed 118 rows containing missing values (geom_point).
## Warning: Removed 113 rows containing missing values (geom_point).
## Warning: Removed 109 rows containing missing values (geom_point).
## Warning: Removed 200 rows containing missing values (geom_point).
## Warning: Removed 94 rows containing missing values (geom_point).
## Warning: Removed 80 rows containing missing values (geom_point).
## Warning: Removed 65 rows containing missing values (geom_point).
## Warning: Removed 59 rows containing missing values (geom_point).
## Warning: Removed 52 rows containing missing values (geom_point).
## Warning: Removed 49 rows containing missing values (geom_point).
## Warning: Removed 44 rows containing missing values (geom_point).
## Warning: Removed 38 rows containing missing values (geom_point).
## Warning: Removed 29 rows containing missing values (geom_point).
## Warning: Removed 49 rows containing missing values (geom_point).
## Warning: Removed 22 rows containing missing values (geom_point).
## Warning: Removed 20 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing missing values (geom_point).
## Warning: Removed 14 rows containing missing values (geom_point).
## Warning: Removed 13 rows containing missing values (geom_point).
## Warning: Removed 10 rows containing missing values (geom_point).

## Warning: Removed 10 rows containing missing values (geom_point).
## Warning: Removed 9 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).

References 1. https://cran.r-project.org/web/packages/maps/maps.pdf by Eric Anderson 2. https://ggplot2.tidyverse.org/reference/geom_map.html 3. https://www.r-spatial.org/r/2018/10/25/ggplot2-sf.html by Mel Moreno and Mathieu Basille 4. Data from https://github.com/CSSEGISandData/COVID-19 5. Examples from Prof. Chris Sutherland at UMass Amherst 6. Examples from Anisa Dhana 7. Examples using Wes Anderson color package 8. https://gganimate.com/articles/gganimate.html 9. https://www.datanovia.com/en/blog/gganimate-how-to-create-plots-with-beautiful-animation-in-r/